[TEP-0104]: Support Task-level resource limits #703

Merged 1 commit into tektoncd:main on Jun 8, 2022

Conversation

lbernick
Member

After further reflection on this proposal, I realized that a pod's effective resource
limits are not used for scheduling, and that limits are enforced on individual containers
by the container runtime. This commit therefore updates TEP-0104 to support Task-level
resource limits. It also changes how Task-level requirements interact with Step-level
requirements and with Sidecars.
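
For illustration only (the exact field name and placement were still under discussion in this PR; see the naming thread below), a TaskRun with task-level resource requirements along the lines discussed in the TEP might look like:

```yaml
# Sketch only: "resources" is the contested field name and the values are illustrative.
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: example-taskrun
spec:
  taskRef:
    name: example-task
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      memory: 2Gi
```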

@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 11, 2022
@lbernick
Member Author

/assign @vdemeester @jerop

FYI @leeonlee @austinzhao-go

@austinzhao-go
Contributor

Added checks for the update (with limits), as discussed in this thread.

@@ -118,63 +118,69 @@ spec:
resources:
Contributor

A check on augmenting the existing Task.spec.resources field, which currently contains inputs and outputs for TaskResources.

I was thinking the context is a bit different: inputs/outputs describe the actual build task, while the added limits/requests describe the underlying Pod. But after checking the other existing resources fields, like Task.Steps.* etc., I still think it is better to keep the same field name and add limits and requests under it, for consistent naming of the same functionality.
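
To make the overload concrete, here is a sketch (illustrative only; the final field name, and whether it lands on Task or TaskRun, was still being discussed in this thread) of how the existing PipelineResource inputs/outputs and the proposed compute requirements would share the same resources field name:

```yaml
spec:
  resources:
    # Existing meaning: PipelineResource inputs/outputs
    inputs:
    - name: source
      type: git
    # Proposed addition (sketch; naming still under discussion): compute requirements
    requests:
      memory: 1Gi
    limits:
      memory: 2Gi
```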

Member Author

Yeah, unfortunately resources is an overloaded word here. However, I agree that it makes sense to use it in this context, because it is the terminology Kubernetes uses to describe compute resources, and luckily PipelineResources will be going away which should lessen the confusion.

Member

Hmmm...that's unfortunate...would the limits and requests fields be added under the current resources field?

Member Author

Ah, I see the problem. This is definitely annoying, but I'm hoping naming bikeshedding won't block this PR -- maybe we could call it "compute" or something.

Contributor

An update from the implementation: for now I have tentatively kept the added requests and limits under the resources field mentioned above:
https://github.com/tektoncd/pipeline/pull/4877/files?diff=unified&w=0#diff-f67e6d3007cea84a7a2e6301beb54c39b96b75ac9e57d888cbf5cc86c57510f3R77

My reasoning:

  • consistent naming for (container compute) resource requirements, matching all the other resources fields such as Task.spec.step.resources and Task.spec.stepTemplate.resources
  • code comments and doc updates can clarify the minor (but meaningful) difference in context between resources for the build steps and container compute resources

Member Author

It's definitely less than ideal, but I think it's OK given that we'll be removing PipelineResources (and better than the alternatives). Thanks for the update, Austin!

Member

Note: we discussed this briefly during last week's API working group, and even though the resources field (PipelineResource) is going away, re-using the same field name is going to create a lot of confusion. We may have to use an even more explicit field here (resource_requirement or something).

Member Author

Hm, yeah, I think this needs more discussion. How would you feel about merging this PR, which covers the behavior changes, and I'll open a subsequent one specifically for naming? I added a note here mentioning the name conflict with PipelineResources.

@lbernick
Member Author

/assign @dibyom

Therefore, this proposal will focus only on task-level resource requests, not limits.
However, the effective resource limits of a pod are not used for scheduling (see
[How Pods with resource requests are scheduled][scheduling] and [How Kubernetes applies resource requests and limits][enforcement]).
Instead, container limits are enforced by the container runtime.
Member

Question: What happens when the container runtime enforces the limit? Do we currently handle that error somehow? Or does the taskrun just time out?

Contributor

I'm working on the implementation, so perhaps I can help with my understanding...

Task-level limits will be applied to each step, so each step/container is enforced with the required limit. Although the summed limits at the task/pod level are larger (one copy per step), the container runtime still effectively enforces the limit over the whole process, because steps execute sequentially in the Tekton context.

Some lines I updated in the docs may also help:
https://github.com/tektoncd/pipeline/pull/4877/files?diff=unified&w=0#diff-5007e9db17cf9b70b930577bd627581e10e754347e92e749f5d58614456d7643R82
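
A rough sketch of that behavior (field name and step names illustrative only): a task-level memory limit of 2Gi would be copied onto each step container, so the pod's summed container limits exceed 2Gi, but because Tekton runs steps sequentially, only one step is actively consuming resources at a time.

```yaml
# Hypothetical task-level limit (field name still under discussion)
spec:
  resources:
    limits:
      memory: 2Gi
---
# Simplified view of the resulting pod: the same limit is applied to each
# step container, and sequential step execution keeps actual usage within it.
spec:
  containers:
  - name: step-build
    resources:
      limits:
        memory: 2Gi
  - name: step-push
    resources:
      limits:
        memory: 2Gi
```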

austinzhao-go (Contributor) commented May 24, 2022

Over-limits at the task level will be checked in a helper function in the reconcile logic, which throws an error to fail the TaskRun.

An example is the current step-level over-limits check, validateTaskSpecRequestResources(); see also the checks implementation.
Member Author

Discussed in tektoncd/pipeline#4930

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 12, 2022
@pritidesai
Member

/kind tep

@tekton-robot tekton-robot added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label May 16, 2022
@lbernick
Member Author

@vdemeester @jerop please take a look when you have the time! (You can also unassign yourselves if you want :) )

We should consider deprecating `Task.Step.Resources`, `Task.StepTemplate.Resources`, and `TaskRun.StepOverrides`.
Specifying resource requirements for individual Steps is confusing and likely too granular for many CI/CD workflows.

We could also consider support for both Task-level and Step-level resource requirements if the requirements are for different types
Member Author

@austinzhao-go updated the TEP to clarify:

We could also consider support for both Task-level and Step-level resource requirements if the requirements are for different types
of compute resources (for example, specifying CPU request at the Step level and memory request at the Task level). However,
this functionality will not be supported by the initial implementation of this proposal; it can be added later if desired.
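
Purely as an illustration of that possible future behavior (not part of the initial implementation, and with hypothetical field placement), a mixed configuration might request CPU per Step and memory at the Task level:

```yaml
# Sketch only: memory requested at the task level, CPU requested per step.
spec:
  resources:
    requests:
      memory: 1Gi
  taskSpec:
    steps:
    - name: build
      image: golang
      resources:
        requests:
          cpu: 500m
```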

@pritidesai
Member

API WG - @vdemeester looking for approval please!

@tekton-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dibyom, jerop, vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

After further reflection on this proposal, I realized that a pod's effective resource
limits are not used for scheduling, and that limits are enforced on individual containers
by the container runtime. This commit therefore updates TEP-0104 to support Task-level
resource limits. It also changes how Task-level requirements interact with Step-level
requirements and with Sidecars. Lastly, it removes the ability to specify both Step-level
and Task-level requirements when the resource types are different, and moves this
functionality to potential future work.
@jerop
Member

jerop commented Jun 8, 2022

All assignees agreed to merge this offline, so merging now - cc @dibyom @vdemeester

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 8, 2022
@tekton-robot tekton-robot merged commit 9a5f2b5 into tektoncd:main Jun 8, 2022